Coronavirus disease 2019 (COVID-19), a disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), began as a flu-like illness and gradually developed into a highly infectious global pandemic that has led to the death of over 6 million people in about 200 countries. It causes moderate to severe respiratory difficulty in infected individuals, and the virus is able to mutate into variants of the original strain. As a result, government agencies and health institutions have sought solutions within and outside the clinical space. This paper models the possible recurrence of COVID-19 as variants and predicts that the subsequent waves will be more severe than the first wave. A long short-term memory (LSTM) network was used to predict the future occurrence of COVID-19 and forecast the virus's pattern. The model was evaluated using precision, recall, F1-score, area under the curve (AUC), and accuracy. The datasets obtained were used to train and test the model. The collected features were passed to the classification network, and the value of each feature was judged by its effect on the system's accuracy. The results showed that the COVID-19 variants have a more disastrous effect within three months after the first wave.
1. INTRODUCTION

The coronavirus disease COVID-19 began as a flu-like illness and gradually developed into a global pandemic that has taken many lives and infected millions of people. Some studies have shown that the virus follows particular patterns and that these patterns depend on the epidemic's complex transmission dynamics [1]-[3]. To identify and assess such infectious diseases, various methods have been used to study epidemic growth under conditions such as weather, region, and the spread of the virus over time [4]-[6]. In December 2020, authorities in the United Kingdom (UK) and the Republic of South Africa announced variants referred to as VOC 202012/01 and 501Y.V2, respectively. Another variant was then announced in Tanzania, which was reported to be killing more people than the first variant that broke out in China. It is unclear how and when VOC 202012/01 emerged [7]. It was detected through routine sampling and genomic testing. Tentative epidemiological, modeling, phylogenetic, and clinical findings suggest that VOC 202012/01 has enhanced transmissibility. Preliminary research also revealed no difference in the incidence or frequency of disease or of reinfection among variant cases relative to other circulating SARS-CoV-2 viruses [8]-[10].
Although the UK variant VOC 202012/01 shares the N501Y mutation with South Africa's 501Y.V2, phylogenetic analysis revealed that the two are distinct variants. The new variants have largely displaced the other circulating SARS-CoV-2 viruses. Testing indicated that the variants were associated with a higher viral load, resulting in higher transmissibility. However, there was no firm evidence of more severe disease related to the variants [8]. COVID-19 data are time series, and sequential models are widely used to cope with their dynamic nature. In particular, the long short-term memory (LSTM) recurrent neural network has been suggested as an appropriate tool for their analysis [11]. The reported study aimed to classify confirmed, fatal, and recovered cases of COVID-19 variants and to construct a COVID-19 predictor to determine future patterns [12]. The analysis is based on a dataset of cases identified and reported up to 15 December 2020 for seven African countries. Many researchers have forecast the spread of the current coronavirus, but the case data form a complex real-time series. Therefore, sequential networks and statistical and epidemiological models are recommended for its analysis [13], [14]. According to researchers, low specific humidity is a critical factor in the laboratory transmission of influenza and in the onset of seasonal influenza in the United States.

Previous studies have analyzed COVID-19 using artificial intelligence and machine learning techniques. A multilayer perceptron (MLP) and an adaptive network-based fuzzy inference system (ANFIS) were used to capture the complicated behavior of the variation in case numbers [15] and to forecast the spread of COVID-19. The number of reported cases and infected persons was projected using a hybrid of support vector regression (SVR) and autoregressive integrated moving average (ARIMA) [16]. An SVR model with a radial basis function (RBF) kernel was also used to predict regular, recovered, and death cases [17], [18]. Other research used SVR and random forest (RF) ensemble predictors to predict patient numbers before hospitalization [19], [20]. Deep learning algorithms have played a vital role in studying and predicting patterns in large outbreak datasets and helped to limit the spread of the coronavirus in its early stages [21]. Building on this body of work, this study proposes a long short-term memory (LSTM) network as an analysis tool for COVID-19 because of the time-series nature of the data. LSTM is a deep learning model capable of handling long-term dependencies. In machine learning, complicated problems are usually solved by gathering the data needed to produce good output [22]-[25]. The rest of this paper is organized as follows: section 2 describes the method, section 3 presents the results and discussion, and section 4 concludes the paper.


2. METHOD 

An LSTM network is composed of memory blocks connected in layers to build more complex recurrent networks. Each block contains gating components that make it more capable than a classic neuron, together with a memory cell designed to store information over longer periods. LSTMs are built from repeated modules linked in a chain, each with a more complex internal structure than a plain recurrent unit and with three multiplicative units called 'gates.' With LSTM, long-term dependencies can be learned, and information from far back in a sequence can be recovered, which suits complex sequence problems. The proposed LSTM is designed to retain knowledge over time so that it can recognize previous inputs when predicting a time series. It can carry intermediate information forward through internal and external feedback connections and thereby encode the temporal context. Because feedback can be both internal and external, the LSTM can learn patterns from records, generalize, and forecast future virus cases. Information moves through the cell state largely unchanged unless a gate modifies it.

It should be noted that LSTM learns long-term dependencies more easily than a plain recurrent network. The input, forget, and output gates control how input flows into the constant error carousel (CEC) cell, how the information is processed, and how output streams from the cell to the rest of the network. Algorithm 1 depicts the use of LSTM for predicting the reoccurrence of COVID-19.


Algorithm 1. An LSTM recurrent neural network for COVID-19 variants’ analysis 


Input: sequence $X_1, \dots, X_{360}$

Output: Reoccurrence Prediction (RP)

Parameters: $(W_i, U_i, B_i, W_f, U_f, B_f, W_g, U_g, B_g, W_o, U_o, B_o)$

Process:

Step 1: Initialize $H_0 = 0$, $C_0 = 0$

Step 2: For $t = 1, \dots, 360$ do

Step 3: Calculate $F_t$, $I_t = \sigma(X_t W_i + H_{t-1} U_i + B_i)$, $O_t = \sigma(X_t W_o + H_{t-1} U_o + B_o)$, $C_t = F_t \odot C_{t-1} + I_t \odot \tilde{C}_t$

Step 4: Update cell state $C_t$: $H_t = O_t \odot \tanh(C_t)$

Step 5: Calculate $\tilde{C}_t$: $R_t = \sigma(X_t U_r + H_{t-1} W_r)$, $U_t = \sigma(X_t U_u + H_{t-1} W_u)$

Step 6: End for 

Step 7: Reoccurrence prediction 

Step 8: End. 
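
The forward pass outlined in Algorithm 1 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the standard LSTM gate equations with a forget gate, small random weights, and a hidden size of 4, none of which are specified in the paper. The names `lstm_step`, `run_lstm`, `init_params`, and `affine` are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def affine(x, h, W, U, b):
    """Return W x + U h + b for one gate, using lists of floats."""
    out = []
    for row_w, row_u, bias in zip(W, U, b):
        total = bias
        total += sum(w * xi for w, xi in zip(row_w, x))
        total += sum(u * hi for u, hi in zip(row_u, h))
        out.append(total)
    return out


def lstm_step(x, h_prev, c_prev, params):
    """One time step: compute the gates, then update cell and hidden state."""
    f = [sigmoid(v) for v in affine(x, h_prev, *params["f"])]    # forget gate
    i = [sigmoid(v) for v in affine(x, h_prev, *params["i"])]    # input gate
    o = [sigmoid(v) for v in affine(x, h_prev, *params["o"])]    # output gate
    g = [math.tanh(v) for v in affine(x, h_prev, *params["g"])]  # candidate state
    # New cell state: keep part of the old state, add part of the candidate.
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    # New hidden state: gated, squashed view of the cell state.
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


def init_params(n_in, n_hidden, seed=0):
    """Small random weights and zero biases (an assumption for illustration)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        gate: (mat(n_hidden, n_in), mat(n_hidden, n_hidden), [0.0] * n_hidden)
        for gate in ("f", "i", "o", "g")
    }


def run_lstm(sequence, n_hidden=4, seed=0):
    """Run the cell over a whole sequence, starting from zero states."""
    params = init_params(len(sequence[0]), n_hidden, seed)
    h = [0.0] * n_hidden
    c = [0.0] * n_hidden
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h, c
```

Calling `run_lstm` on a list of 360 one-element feature vectors mirrors the loop over t = 1, ..., 360 in Algorithm 1. In practice the final hidden state would feed a trained output layer that produces the reoccurrence prediction; that layer and the training procedure are left out of this sketch.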
In this study, a time-series model is built using recurrent neural networks (RNNs) with LSTM blocks. Memory blocks, a key element of LSTM networks, were introduced to combat vanishing gradients by retaining network state over long periods. Memory blocks in the LSTM architecture resemble differentiable storage systems in digital computers. The gates in an LSTM process the data with the sigmoid activation function, whose output ranges from 0 to 1, so that only non-negative values are passed on to the next gates to produce an outcome. Equation (1) gives more detail about the LSTM,

$Z_t = \sigma(Q_f \cdot [P_{t-1}, X_t] + O_f)$, (1)

where $Z_t$ is the forget gate, $Q_f$ and $O_f$ are the weight matrix and bias of the forget gate, $P_{t-1}$ is the previous hidden state, $X_t$ is the current input, and $\sigma$ is the sigmoid function,

$I_t = \sigma(W_i \cdot [P_{t-1}, X_t] + O_i)$, (2)

where $I_t$ is the activation of the input gate, with weight matrix $W_i$ and bias $O_i$, and

$\tilde{C}_t = \tanh(W_c \cdot [P_{t-1}, X_t] + O_c)$, (3)

determines the component of the current cell state that is passed on to the output (Hochreiter and Schmidhuber, 1997). Figure 1 shows the architecture of the model used in this study.
Figure 1. Proposed LSTM model
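
Equations (1)-(3) cover the forget gate, the input gate, and the candidate state. For completeness, the remaining updates of the standard LSTM formulation can be written in the same notation. This is a sketch of the textbook form rather than a statement of the exact variant used in this study. The symbols $S_t$ (output gate), $W_s$ and $O_s$ (its weight matrix and bias), and $C_t$ (cell state) are assumptions introduced here for illustration. $Z_t$ is the forget gate, $I_t$ the input gate, $\tilde{C}_t$ the candidate state from (3), $P_t$ the hidden state, and $\odot$ denotes element-wise multiplication.

```latex
\begin{aligned}
S_t &= \sigma\left(W_s \cdot [P_{t-1}, X_t] + O_s\right) \\
C_t &= Z_t \odot C_{t-1} + I_t \odot \tilde{C}_t \\
P_t &= S_t \odot \tanh(C_t)
\end{aligned}
```

Because $Z_t$ and $I_t$ lie between 0 and 1, the second line blends the previous cell state with the new candidate. This additive update is what lets information persist over many time steps.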
Data were collected from the Africa Centres for Disease Control and Prevention and the Nigeria Centre for Disease Control and stored on the system, which served as the database. The data were then retrieved from the database and preprocessed by cleaning and checking for null values, and information that was not needed was discarded. Because the data were raw exports from the websites mentioned, standardization involved checking for 0s and 1s, strings, and stray characters. The data were divided into training and testing sets: 70% for training and 30% for testing. The LSTM was fitted to the training data, and the model's performance and accuracy were measured. When performance was low because of a system factor or an error in the data, the model was retrained after tuning its hyperparameters. Once the model performed well on the training data, it was tested on the remaining 30% of the data. System performance was measured using the evaluation metrics.
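
The evaluation metrics named in this paper (precision, recall, F1-score, and accuracy) can be computed from true and predicted labels as in the following minimal sketch. It assumes binary labels, such as the 0/1 coding used for the variants in this study. The function names and any example labels are illustrative and are not taken from the authors' code. AUC is omitted because it requires predicted scores rather than hard labels.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Return precision, recall, F1-score, and accuracy as a dict."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }
```

Precision asks how many predicted positives were correct, recall asks how many actual positives were found, and the F1-score is their harmonic mean, so it stays low if either one is low.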


3. RESULTS AND DISCUSSION 
The model was trained on the dataset using Python libraries and supporting editor tools. The model configuration was constructed with the libraries' APIs for recurrent neural networks, specifically LSTM. The model described the dependent data structure and mapped it to the current learning sequence to forecast the number of reported cases in any given country. Using the models, the data can be viewed country by country as a daily history of COVID-19 occurrence. The model's hyperparameters were tuned rigorously. The dataset used for the study was gathered for six African countries from the centers for disease control. Other criteria considered in the dataset are the ratio of female to male infections and the age range of the individuals, which helps to make a good prediction. The training set is supplied to the LSTM network as a vector. The network outputs are compared with the targets, and the weights are adjusted accordingly. The sample signals are then passed through the network, where the target values are estimated using the weighted data. Figure 2 shows a map of Africa in which the countries considered in the study are colored yellow. These are the most affected countries on the African continent, which is why they were selected for the study. Figure 2(a) shows the first wave of COVID-19, while Figure 2(b) shows the second wave. Table 1 shows the total numbers of recorded cases, deaths, and recovered persons for each selected country.


Table 1. Summary of the data collected for the training and testing of the model 


Country        Date                   Cases     Recovered   Deaths
Nigeria        3 March-15 July        34,259    13,466      760
Egypt          15 February-15 July    84,843    24,433      4,067
South Africa   6 March-15 July        311,049   13,495      4,453
Kenya          14 March-15 July       11,252    2,905       209
Morocco        3 March-15 July        16,262    1,229       212
Madagascar     22 March-15 July       5,605     2,388       43


Confirmed: 62,590 Condemed 386,570 
Figure 2. Map of the African continent, indicating with arrows the observation area for this study:
(a) first wave of COVID-19 and (b) second wave of COVID-19


Table 1 summarizes the data collected for each country for the first wave of COVID-19, starting from the date the first case was recorded in that country. South Africa and Egypt have the highest numbers of infections and fatalities, followed by Nigeria. Since the recorded data were divided 70%/30% for training and testing, the data recorded from March to May were used for training, while the June and July data were used for testing. The sharp increase in cases can be linked to the population of these countries, their environment, and human negligence or ignorance of health rules, as well as to the re-emergence of the virus with a more complicated genome (Table 2). The second wave of the virus tends to infect more of the population than the first wave, and in some cases it resists the vaccine administered to the body. The 0s and 1s used in Table 2 represent the 'high' and 'very high' levels of each virus variant for each country. Table 3 shows the recovery and mortality results for a set of samples.
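
The 70%/30% split described above, with earlier months used for training and later months for testing, can be sketched as follows. The sliding-window step is a common way to present a series to an LSTM and is an assumption here, since the paper does not state a window length. The function names `chronological_split` and `make_windows` are hypothetical.

```python
def chronological_split(series, train_percent=70):
    """Split a time-ordered series without shuffling.

    The first `train_percent` percent of the points form the training set
    and the remainder forms the test set, so the test data always come
    later in time than the training data.
    """
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding at the cut point.
    cut = len(series) * train_percent // 100
    return series[:cut], series[cut:]


def make_windows(series, window):
    """Build (inputs, target) pairs: `window` past values predict the next."""
    if window < 1:
        raise ValueError("window must be at least 1")
    pairs = []
    for start in range(len(series) - window):
        inputs = series[start:start + window]
        target = series[start + window]
        pairs.append((inputs, target))
    return pairs
```

For example, a list of daily case counts can be passed to `chronological_split`, and `make_windows(train, 7)` would then pair each run of seven consecutive days with the count on the following day. Splitting by time rather than at random keeps future observations out of the training set.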


Table 2. First wave and second wave of COVID-19 


Country        First wave   Second wave   Mortality (first wave)   Mortality (second wave)
Nigeria        0            1             0                        1
South Africa   1            1             1                        0
Egypt          1            1             1                        0
Kenya          1            0             1                        0
Morocco        1            0             0                        1
Madagascar     0            1             1                        0


Table 3. Recovery rate and mortality rate recorded by the end of the first and second waves of COVID-19

Samples   Confirmed cases   Deaths   Recovered   Recovery rate (%)   Mortality rate (%)
3006      779               4        372         47.7635             0.513479
2542      745               6        340         45.6376             0.805360
2068      684               5        431         63.0117             0.730994
3058      661               6        143         20.0985             0.734214
1809      675               7        230         34.0741             1.03704
2504      667               12       274         41.0795             1.7991
2987      663               4        166         25.0377             0.603318
2523      661               19       137         20.7262             2.87443
2047      649               9        276         42.3720             1.36675
2032      627               12       397         63.3174             1.91366
2935      594               7        209         35.1852             1.17845
2207      587               14       344         58.6031             2.36601
2079      573               4        129         22.5131             0.69808
1953      566               8        395         69.788              1.41343
4205      561               17       344         61.3191             3.0303
2460      501               8        201         40.1198             1.59661
2140      490               7        382         77.9592             1.42857
7397      490               31       274         55.9164             6.32653
2539      452               8        229         50.6637             1.76901
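
Most rows of Table 3 are consistent with the two rate columns being simple percentages of confirmed cases: recovered divided by confirmed, and deaths divided by confirmed, each multiplied by 100. The paper does not state these formulas, so the sketch below is an inference from the tabulated values, and the function names are hypothetical.

```python
def recovery_rate(recovered, confirmed):
    """Recovered persons as a percentage of confirmed cases."""
    if confirmed <= 0:
        raise ValueError("confirmed must be positive")
    return 100.0 * recovered / confirmed


def mortality_rate(deaths, confirmed):
    """Deaths as a percentage of confirmed cases."""
    if confirmed <= 0:
        raise ValueError("confirmed must be positive")
    return 100.0 * deaths / confirmed
```

For the row with 684 confirmed cases, 5 deaths, and 431 recoveries, `recovery_rate(431, 684)` and `mortality_rate(5, 684)` agree with the tabulated 63.0117 and 0.730994 to the precision shown.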


4. CONCLUSION 

Since the outbreak of the COVID-19 pandemic, it has dominated global discussion, and solutions to its continued occurrence have been sought within and outside the clinical field. Such solutions are important for health and government agencies seeking to return their citizens to normal life after months of lockdowns. This paper used an RNN with LSTM blocks to train, model, and predict the possible rise in COVID-19 occurrences. Using the LSTM-based model, it was possible to predict the persistence of the virus at certain points with minimal prediction error, increased learning capacity, and higher precision. COVID-19 data curves from six African countries (Nigeria, South Africa, Egypt, Kenya, Morocco, and Madagascar) were used as input data for the proposed RNN-LSTM model.

The dataset retrieved from the centers for disease control was also used to test the developed model. The predictive model was observed to capture past properties of the time-varying data and related trends and to forecast the COVID-19 trend. It provides an effective means of estimating the number of future COVID-19 infections. It could serve as an alternative model for estimating the number of weather-based incidents, which can be used to forecast the economic implications of such a pandemic. It could also serve as a solid basis for a more accurate LSTM-based model for predicting future occurrences of this kind of pandemic in the near future. The developed model could also be adapted to predict other quantities of interest, such as fatality and morbidity. The model is expected to be adopted by relevant authorities in making informed decisions on prevention measures and policies. The forecast results would help to provide and organize adequate medical facilities and ensure that people's economic activities are not unnecessarily jeopardized for a long time.